Machine Translation and Automatic Language Data Processing

Author

LÉON E. DOSTERT

Abstract

This chapter discusses machine, or automatic, translation of natural languages. It reviews the current state of the art; explains its basic operations, methods, and procedures; indicates its objectives and uses; and situates machine translation (MT) in the general field of automatic language data processing. Finally, it suggests its possible role in language communication as a whole.

Machine translation is a relatively new area of automatic language data processing. It came about in part as a result of the conjuncture of three trends: (1) the development of structuralist procedures in linguistics; (2) the increasing sophistication of programming techniques; and (3) the growing capabilities and versatility of computation devices. It also became a subject of interest in the scientific and managerial communities as a result of the increasing volume and diversity of scientific and technical writings in the several languages of scientifically creative cultures, and the lengthening lag between the publication of information in a given language and its accessibility in one or several other languages.

A decade ago machine translation was of interest to a relatively small group of people coming from such apparently unrelated fields as philosophy, physics, mathematics, sociology, logic, computational engineering, chemistry, and of course linguistics and languages. This diversity of background among the early comers was to bring about a widely diversified and divergent set of notions as to what automatic translation is or should be, what it ought to try to do, and how and why it should do it. Notwithstanding these divergences, MT research today is pursued in a number of centers and laboratories in some twenty countries, including, besides the United States, where oriented research may be said to have originated, Great Britain, the U.S.S.R., Japan, Italy, France, Belgium, Germany, and others.

The first public demonstration of feasibility was carried out jointly by Georgetown University and IBM in January 1954, on the basis of an experiment for the transfer of a small corpus of Russian

Similar resources

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in the English language are hyphenated, with the hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use the space character instead of the half-space character. This common incorrect use of space leads to some s...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

A Part-of-Speech Tagging System for the Persian Language

Abstract: Part-of-speech (POS) tagging is essential work for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, automatic speech recognition, etc. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees

Automatically identifying words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct semantic roles to them may lead to improvement in many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

Morpho-Syntax Based Statistical Methods for Automatic Sign Language Translation

We present a novel approach for the automatic translation of written text into sign language. A new corpus focussing on the weather report domain for the language pair German and German Sign Language is introduced. We apply phrase-based statistical machine translation, enhanced by pre- and post-processing steps based on the morpho-syntactical analysis of German. Detailed results are given based o...

Publication date: 2013
